Multiple imputation: an alternative to top coding for statistical disclosure control

Authors

  • Di An
  • Roderick J. A. Little
Abstract

Top coding of extreme values of variables such as income is a common method of statistical disclosure control, but it creates problems for the data analyst. The paper proposes two alternatives to top coding for statistical disclosure control, both based on multiple imputation. We show in simulation studies that, when the publicly released data are analysed with straightforward multiple-imputation methods, the proposed methods yield better inferences than top coding while maintaining good statistical disclosure control properties. We illustrate the methods on data from the 1995 Chinese Household Income Project.
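The contrast between top coding and multiple imputation can be illustrated with a minimal Python sketch. It is not the authors' procedure: the lognormal income model, the 95th-percentile top code, and the names `top_code`, `impute_above_cap`, `rubin_combine` and `demo` are hypothetical choices made for illustration. Values above the top code are replaced by m independent draws from a fitted lognormal tail (with the model parameters themselves drawn from their posterior), and the analyst combines the m completed-data means with Rubin's combining rules.

```python
import math
import random
from statistics import NormalDist, fmean, quantiles, variance

# Hypothetical illustration of top coding versus multiple imputation;
# the lognormal model and all names are assumptions, not the paper's method.


def top_code(values, cap):
    """Top coding: every value above `cap` is released as `cap`."""
    return [min(v, cap) for v in values]


def impute_above_cap(values, cap, rng):
    """One imputed copy of `values`: entries above `cap` are replaced by
    draws from a lognormal model truncated to (cap, infinity).

    Assumed model: log(value) ~ Normal(mu, sigma^2), fitted by the agency to
    the confidential data. (mu, sigma^2) are drawn from their posterior under
    the prior p(mu, sigma^2) proportional to 1/sigma^2, so repeated calls
    give proper multiple imputations.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean_log, s2 = fmean(logs), variance(logs)
    # sigma^2 | data ~ (n - 1) s^2 / chi^2_{n-1}; chi^2_k is Gamma(k/2, scale=2).
    sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2)
    # mu | sigma^2, data ~ Normal(mean_log, sigma^2 / n).
    mu = rng.gauss(mean_log, math.sqrt(sigma2 / n))
    dist = NormalDist(mu, math.sqrt(sigma2))
    p_cap = dist.cdf(math.log(cap))
    imputed = []
    for v in values:
        if v <= cap:
            imputed.append(v)  # values at or below the cap are released as-is
        else:
            # Inverse-CDF draw from the normal truncated to (log cap, inf).
            u = min(p_cap + (1.0 - p_cap) * rng.random(), 1.0 - 1e-12)
            imputed.append(math.exp(dist.inv_cdf(u)))
    return imputed


def rubin_combine(estimates, within_vars):
    """Rubin's rules: point estimate and total variance from m analyses."""
    m = len(estimates)
    q_bar = fmean(estimates)
    u_bar = fmean(within_vars)   # average within-imputation variance
    b = variance(estimates)      # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b


def demo(seed=1, n=2000, m=5):
    """Compare the mean from top-coded data with the MI estimate."""
    rng = random.Random(seed)
    original = [rng.lognormvariate(10.0, 1.0) for _ in range(n)]
    cap = quantiles(original, n=20)[-1]  # hypothetical top code: 95th percentile
    tc_mean = fmean(top_code(original, cap))
    estimates, within = [], []
    for _ in range(m):
        completed = impute_above_cap(original, cap, rng)
        estimates.append(fmean(completed))
        within.append(variance(completed) / n)  # sampling variance of a mean
    mi_mean, mi_var = rubin_combine(estimates, within)
    return fmean(original), tc_mean, mi_mean, mi_var


print(demo())
```

With top coding, the released mean is biased downward because the tail above the cap is discarded; the imputed copies restore that tail in distribution, and the between-imputation variance carries the extra uncertainty into the total variance.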

Similar articles

A multiple imputation approach to disclosure limitation for high-age individuals in longitudinal studies.

Disclosure limitation is an important consideration in the release of public use data sets. It is particularly challenging for longitudinal data sets, since information about an individual accumulates with repeated measures over time. Research on disclosure limitation methods for longitudinal data has been very limited. We consider here problems created by high ages in cohort studies. Because o...

Using Multiple Imputation Technique to Correct for Measurement Error and Statistical Disclosure Control in Sensitive Count Data in a National Survey

Measurement error in responses to sensitive questions is pervasive and therefore biases the estimation of most statistical models. The objective of this paper is to correct for measurement error in the number of lifetime sexual partners by treating it as a missing-data problem and using the multiple imputation technique to synthesize this underlying “true” attribute. Bayesian Poisson model with diffuse Gaussian ...

Multiple Imputation for Disclosure Limitation: Future Research Challenges

Statistical agencies that disseminate data to the public are ethically and often legally required to protect the confidentiality of respondents’ identities and sensitive attributes. To satisfy these requirements, Rubin (1993), Little (1993), and Fienberg (1994) proposed that agencies utilize multiple imputation. For example, agencies can release the units originally surveyed with some values, s...

Combining synthetic data with subsampling to create public use microdata files for large scale surveys

To create public use files from large scale surveys, statistical agencies sometimes release random subsamples of the original records. Random subsampling reduces file sizes for secondary data analysts and reduces risks of unintended disclosures of survey participants’ confidential information. However, subsampling does not eliminate risks, so that alteration of the data is needed before dissemi...

Disclosure Control in Business Data - Experiences with Multiply Imputed Synthetic Datasets for the German IAB Establishment Survey

Generating synthetic datasets based on the ideas of multiple imputation is an innovative method for statistical disclosure control. The basic idea is to replace the values of some confidential variables X with several draws from the posterior predictive distribution of X given some nonconfidential variables Y. Since the synthetic values are based on models for the joint distribution of the da...
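The synthetic-data idea summarized in the abstract above, replacing a confidential variable X with draws from its posterior predictive distribution given nonconfidential variables Y, can be sketched in Python for the simplest case of one X and one Y. The normal linear regression model, the noninformative prior, the simulated data and the names `synthesize` and `ols_slope` are hypothetical choices for illustration; they are not taken from the IAB Establishment Survey work.

```python
import math
import random
from statistics import fmean

# Hypothetical sketch of partially synthetic data: the regression model,
# the prior and the function names are illustrative assumptions.


def synthesize(x, y, rng):
    """One synthetic copy of confidential x, drawn from its posterior
    predictive distribution given nonconfidential y.

    Assumed model: x = a + b * (y - mean(y)) + e, e ~ Normal(0, sigma^2),
    with prior p(a, b, sigma^2) proportional to 1 / sigma^2.
    """
    n = len(x)
    y_bar = fmean(y)
    yc = [v - y_bar for v in y]                 # centred predictor
    sxx = sum(c * c for c in yc)
    a_hat = fmean(x)
    b_hat = sum(c * xi for c, xi in zip(yc, x)) / sxx
    rss = sum((xi - a_hat - b_hat * c) ** 2 for xi, c in zip(x, yc))
    # sigma^2 | data ~ RSS / chi^2_{n-2}; chi^2_k is Gamma(k/2, scale=2).
    sigma = math.sqrt(rss / rng.gammavariate((n - 2) / 2, 2))
    # With a centred predictor the posteriors of a and b are independent.
    a = rng.gauss(a_hat, sigma / math.sqrt(n))
    b = rng.gauss(b_hat, sigma / math.sqrt(sxx))
    # Draw new x values; y is released unchanged.
    return [rng.gauss(a + b * c, sigma) for c in yc]


def ols_slope(x, y):
    """Least-squares slope of x on y, as an analyst might compute it."""
    x_bar, y_bar = fmean(x), fmean(y)
    num = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    return num / sum((yi - y_bar) ** 2 for yi in y)


rng = random.Random(7)
y = [rng.gauss(50.0, 10.0) for _ in range(1000)]        # nonconfidential
x = [2.0 + 0.5 * v + rng.gauss(0.0, 3.0) for v in y]    # confidential
copies = [synthesize(x, y, rng) for _ in range(5)]       # m = 5 synthetic copies
print([round(ols_slope(c, y), 3) for c in copies])
```

Each copy keeps Y and replaces X, so this is a partially synthetic release; analysts estimate from each copy and combine the m results with the combining rules for synthetic data, which differ from Rubin's rules for missing data.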

Journal title:

Volume:   Issue:

Pages: -

Publication date: 2007